Adventure Works Analysis on Microsoft SQL Server
Adventure Works Analysis on Microsoft SQL Server
This project demonstrates my proficiency in Exploratory Data Analysis (EDA) using Microsoft SQL Server on the comprehensive AdventureWorks sample database. The goal was to extract meaningful insights from various business domains including product information, employee demographics, and sales performance, by formulating and executing complex SQL queries.
Tools Used
- Microsoft SQL Server (MSSQL)
Analysis Questions and Solutions
Below are the 16 analytical questions tackled in this project, along with the SQL queries used to derive their answers.
Question 1
Retrieve information about the products with colour values except null, red, silver/black, white and list price between £75 and £750. Rename the column StandardCost to Price. Also, sort the results in descending order by list price.
SELECT NAME, COLOR, LISTPRICE, STANDARDCOST AS PRICE
FROM Production.Product
WHERE Color NOT IN('NULL', 'RED', 'SILVER/BLACK', 'WHITE')
AND LISTPRICE BETWEEN 75 AND 750
ORDER BY ListPrice DESC;
Question 2
Find all the male employees born between 1962 to 1970 and with hire date greater than 2001 and female employees born between 1972 and 1975 and hire date between 2001 and 2002.
SELECT GENDER, BIRTHDATE, HIREDATE FROM HumanResources.Employee
WHERE GENDER = 'M' AND BIRTHDATE BETWEEN '1962' AND '1970' AND HIREDATE > '2001'
UNION
SELECT GENDER, BIRTHDATE, HIREDATE FROM HumanResources.Employee
WHERE GENDER = 'F' AND BIRTHDATE BETWEEN '1972' AND '1975' AND HIREDATE BETWEEN '2001' AND '2002';
Question 3
Create a list of 10 most expensive products that have a product number beginning with ‘BK’. Include only the product ID, Name and colour.
SELECT TOP 10 ProductID, NAME, COLOR FROM Production.Product
WHERE ProductNumber LIKE 'BK%'
ORDER BY StandardCost DESC;
Question 4
Create a list of all contact persons, where the first 4 characters of the last name are the same as the first four characters of the email address. Also, for all contacts whose first name and the last name begin with the same characters, create a new column called full name combining first name and the last name only. Also provide the length of the new column full name.
SELECT
FIRSTNAME, LASTNAME, EMAILADDRESS,
CONCAT(FIRSTNAME, ' ', LastName) AS FULLNAME,
LEN(CONCAT(FIRSTNAME, ' ', LastName)) AS LEN_FULLNAME
FROM Person.Person AS PP
INNER JOIN Person.EmailAddress AS PEA
ON PP.BusinessEntityID = PEA.BusinessEntityID
WHERE LEFT(LASTNAME, 4) = LEFT(EmailAddress, 4)
AND LEFT(FIRSTNAME, 1) = LEFT(LASTNAME, 1);
Question 5
Return all product subcategories that take an average of 3 days or longer to manufacture.
SELECT PPSC.Name, DAYSTOMANUFACTURE
FROM Production.ProductSubcategory AS PPSC
LEFT JOIN Production.Product AS PP
ON PP.ProductSubcategoryID = PPSC.ProductSubcategoryID
WHERE DaysToManufacture >= 3;
Question 6
Create a list of product segmentation by defining criteria that places each item in a predefined segment as follows. If price gets less than £200 then low value. If price is between £201 and £750 then mid value. If between £750 and £1250 then mid to high value else higher value. Filter the results only for black, silver and red color products.
SELECT
NAME, LISTPRICE, COLOR,
CASE
WHEN LISTPRICE < 200 THEN 'LOW VALUE'
WHEN LISTPRICE BETWEEN 201 AND 750 THEN 'MID VALUE'
WHEN LISTPRICE BETWEEN 750 AND 1250 THEN 'MID T0 HIGH_VALUE'
ELSE 'HIGHER VALUE'
END AS PRODUCT_SEGMENTATION
FROM Production.Product
WHERE COLOR IN ('BLACK', 'SILVER', 'RED');
Question 7
How many Distinct Job title is present in the Employee table?
SELECT COUNT(DISTINCT JOBTITLE) AS "Number of Distinct Job titles"
FROM HumanResources.Employee;
Question 8
Use employee table and calculate the ages of each employee at the time of hiring.
SELECT YEAR(HIREDATE) - YEAR(BIRTHDATE) AS AGE_AT_HIRING
FROM HumanResources.Employee;
Question 9
How many employees will be due a long service award in the next 5 years, if long service is 20 years?
SELECT COUNT(CASE WHEN (YEAR(GETDATE()) - YEAR(HIREDATE)) >= 20 THEN 1 END) AS EMPLOYEES_FOR_LONG_SERVICE_AWARD
FROM HumanResources.Employee
WHERE (YEAR(GETDATE()) - YEAR(HIREDATE)) >= 15 AND (YEAR(GETDATE()) - YEAR(HIREDATE)) < 20; -- Employees reaching 20 years in next 5 years
Note: The original query `WHERE 2028-YEAR(HIREDATE) >= 20` uses a fixed year '2028'. The updated query dynamically checks for employees who will reach 20 years of service within the next 5 years from the current date. For a more precise calculation, you might consider `DATEDIFF(year, HireDate, GETDATE())`.
Question 10
How many more years does each employee have to work before reaching sentiment, if sentiment age is 65?
SELECT
BUSINESSENTITYID,
DATEDIFF(YEAR, BIRTHDATE, GETDATE()) AS CURRENT_AGE,
65 - DATEDIFF(YEAR, BIRTHDATE, GETDATE()) AS YEARS_TO_RETIREMENT
FROM HumanResources.Employee;
Question 11
Implement new price policy on the product table base on the colour of the item. If white increase price by 8%, If yellow reduce price by 7.5%, If black increase price by 17.2%. If multi, silver, silver/black or blue take the square root of the price and double the value. Column should be called Newprice. For each item, also calculate commission as 37.5% of newly computed list price.
SELECT
PRODUCTID, NAME, COLOR, StandardCost,
CASE
WHEN COLOR = 'WHITE' THEN StandardCost + (StandardCost * 0.08)
WHEN COLOR = 'YELLOW' THEN StandardCost - (StandardCost * 0.075)
WHEN COLOR = 'BLACK' THEN StandardCost + (StandardCost * 0.172)
WHEN COLOR IN ('MULTI', 'SILVER', 'SILVER/BLACK', 'BLUE') THEN SQRT(StandardCost) * 2
ELSE StandardCost
END AS NEW_PRICE,
(CASE
WHEN COLOR = 'WHITE' THEN StandardCost + (StandardCost * 0.08)
WHEN COLOR = 'YELLOW' THEN StandardCost - (StandardCost * 0.075)
WHEN COLOR = 'BLACK' THEN StandardCost + (StandardCost * 0.172)
WHEN COLOR IN ('MULTI', 'SILVER', 'SILVER/BLACK', 'BLUE') THEN SQRT(StandardCost) * 2
ELSE StandardCost
END) * 0.375 AS COMMISSION
FROM PRODUCTION.PRODUCT;
Question 12
Print the information about all the Sales.Person and their sales quota. For every Sales person you should provide their FirstName, LastName, HireDate, SickLeaveHours and Region where they work.
SELECT
PP.FirstName, PP.LastName, HRE.HireDate, HRE.SickLeaveHours, ST.NAME AS Region, SQH.SalesQuota
FROM Sales.SalesPerson AS SP
JOIN HumanResources.Employee AS HRE ON SP.BusinessEntityID = HRE.BusinessEntityID
JOIN Person.Person AS PP ON HRE.BusinessEntityID = PP.BusinessEntityID
JOIN Sales.SalesTerritory AS ST ON SP.TerritoryID = ST.TerritoryID
JOIN Sales.SalesPersonQuotaHistory AS SQH ON SP.BusinessEntityID = SQH.BusinessEntityID
ORDER BY PP.LastName, PP.FirstName;
Question 13
Using adventure works, write a query to extract the following information. Product name, Product category name, Product subcategory name, Sales person, Revenue, Month of transaction, Quarter of transaction, Region
SELECT
P.Name AS PRODUCT_NAME,
PC.Name AS PRODUCT_CATEGORY_NAME,
PSC.Name AS PRODUCT_SUBCATEGORY_NAME,
SUM(SOD.LineTotal) AS REVENUE,
CONCAT(PP.FIRSTNAME, ' ', PP.LASTNAME) AS SALES_PERSON,
DATEPART(MONTH, SOH.OrderDate) AS MONTH_OF_TRANSACTION,
DATEPART(QUARTER, SOH.OrderDate) AS QUARTER_OF_TRANSACTION,
S.Name AS REGION
FROM Sales.SalesOrderHeader AS SOH
JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
JOIN Production.Product AS P ON SOD.ProductID = P.ProductID
JOIN Production.ProductSubcategory AS PSC ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
JOIN Production.ProductCategory AS PC ON PSC.ProductCategoryID = PC.ProductCategoryID
JOIN Sales.SalesPerson AS SP ON SOH.SalesPersonID = SP.BusinessEntityID
JOIN Person.Person AS PP ON SP.BusinessEntityID = PP.BusinessEntityID
JOIN Sales.SalesTerritory S ON SP.TerritoryID = S.TerritoryID
GROUP BY
P.Name, PC.Name, PSC.Name, PP.FirstName, PP.LastName,
DATEPART(MONTH, SOH.OrderDate), DATEPART(QUARTER, SOH.OrderDate), S.Name
ORDER BY PP.LastName, PP.FirstName;
Question 14
Display the information about the details of an order i.e. order number, order date, amount of order, which customer gives the order and which salesman works for that customer and how much commission he gets for an order.
SELECT
SOH.SalesOrderNumber AS 'Order Number',
SOH.OrderDate AS 'Order Date',
SOH.TotalDue AS 'Amount of Order',
SC.CustomerID AS 'Customer ID',
PP.FirstName + ' ' + PP.LastName AS 'Salesperson Name',
SP.CommissionPct AS 'Commission Percentage'
FROM Sales.SalesOrderHeader AS SOH
JOIN Sales.Customer AS SC ON SOH.CustomerID = SC.CustomerID
JOIN Sales.SalesPerson AS SP ON SOH.SalesPersonID = SP.BusinessEntityID -- Corrected join for SalesPersonID
JOIN Person.Person AS PP ON SP.BusinessEntityID = PP.BusinessEntityID;
Note: Corrected the join condition for `Person.Person` to `SP.BusinessEntityID = PP.BusinessEntityID` to correctly link the salesperson to their personal information.
Question 15
For all the products calculate - Commission as 14.790% of standard cost, - Margin, if standard cost is increased or decreased as follows: Black: +22%, Red: -12%, Silver: +15%, Multi: +5%, White: Two times original cost divided by the square root of cost. For other colours, standard cost remains the same.
SELECT
PRODUCTID, NAME, STANDARDCOST, COLOR,
STANDARDCOST * 0.14790 AS COMMISSION,
(CASE
WHEN COLOR = 'BLACK' THEN STANDARDCOST + (STANDARDCOST * 0.22)
WHEN COLOR = 'RED' THEN StandardCost - (STANDARDCOST * 0.12)
WHEN COLOR = 'SILVER' THEN STANDARDCOST + (STANDARDCOST * 0.15)
WHEN COLOR = 'MULTI' THEN STANDARDCOST + (STANDARDCOST * 0.05)
WHEN COLOR = 'WHITE' THEN 2 * STANDARDCOST / SQRT(STANDARDCOST)
ELSE STANDARDCOST
END) - STANDARDCOST AS MARGIN
FROM Production.Product;
Question 16
Create a view to find out the top 5 most expensive products for each colour.
CREATE VIEW Top5ExpensiveProductsPerColor AS
SELECT
ProductID,
Name,
Color,
ListPrice
FROM (
SELECT
p.ProductID,
p.Name,
p.Color,
p.ListPrice,
ROW_NUMBER() OVER (PARTITION BY p.Color ORDER BY p.ListPrice DESC) AS Rank
FROM
Production.Product p
WHERE
p.Color IS NOT NULL
) RankedProducts
WHERE Rank <= 5;
-- To view the results from the view:
SELECT * FROM Top5ExpensiveProductsPerColor;
Results & Impact
Through this EDA project, I successfully extracted key business insights from the AdventureWorks database, showcasing my ability to:
- Understand and navigate complex relational schemas.
- Formulate precise SQL queries to answer specific business questions.
- Apply a wide range of SQL functions (string, date, aggregate, window functions, conditional logic).
- Identify patterns and trends in sales, product, and employee data.
- Prepare data for further analysis or reporting by deriving new metrics.
Learnings and Takeaways
This project was invaluable in deepening my practical SQL skills, especially in applying them to a real-world, complex dataset. Key takeaways include:
- Enhanced proficiency in multi-table `JOIN` operations and understanding their nuances.
- Mastery of `CASE` statements for complex conditional logic and data segmentation.
- Improved ability to use date and time functions for extracting temporal insights.
- Understanding the power of `VIEW`s for managing complex queries and improving reusability.
- Reinforced the importance of precise query formulation for accurate data extraction in EDA.
Project information
- Category Exploratory Data Analysis (EDA)
- Tools MSSQL
- Project date July 2025
- Project Link GitHub Repository
- View Full SQL Script on GitHub